description |
The Web has created opportunities to connect information sources
across all types of boundaries. These sources include databases,
XML documents, and unstructured sources. Data integration
combines data residing at different sources and provides the user
with a unified view of these data. Users now expect more efficient
service from such data integration systems. However, querying
multiple data sources scattered across the Web is difficult to do
efficiently because of the heterogeneity and autonomy of the information
sources. This paper describes a query optimizer that uses
constraints to optimize queries semantically. The optimizer
first translates constraints from the data sources into constraints
expressed at the global level, e.g., in the common schema, and
stores them in a constraint repository, also at the global
level. The optimizer can then use semantic query optimization
techniques, including detection of empty results, join elimination,
and predicate elimination, to generate a more efficient but
semantically equivalent query for the user. The optimizer is
published as a web service and can be invoked by multiple data
integration systems. We carry out experiments with our semantic
query optimizer, and initial results show that performance can be
greatly improved.
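
The three rewrites named above can be illustrated with a minimal
sketch. It is not the paper's implementation: the Query and
Constraints classes, the interval encoding of predicates, and the
optimize function are all hypothetical names chosen for illustration.
The sketch assumes that global-level constraints are either inclusive
value ranges on attributes or foreign keys that are non-null and
reference a unique key.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding: an inclusive (low, high) range on "table.attr".
Interval = Tuple[float, float]


@dataclass
class Query:
    select: List[str]                     # projected attributes
    tables: Set[str]                      # relations in the global schema
    predicates: Dict[str, Interval]       # attribute -> required range
    joins: List[Tuple[str, str]] = field(default_factory=list)  # (fk, pk)


@dataclass
class Constraints:
    ranges: Dict[str, Interval]           # ranges guaranteed by the sources
    foreign_keys: Set[Tuple[str, str]]    # assumed non-null, referencing a unique key


def table_of(attr: str) -> str:
    return attr.split(".", 1)[0]


def optimize(q: Query, c: Constraints) -> Optional[Query]:
    """Return an equivalent query, or None if the result is provably empty."""
    kept: Dict[str, Interval] = {}
    for attr, (lo, hi) in q.predicates.items():
        if attr in c.ranges:
            clo, chi = c.ranges[attr]
            if hi < clo or lo > chi:
                return None               # empty-result detection
            if lo <= clo and chi <= hi:
                continue                  # predicate elimination: implied by constraint
        kept[attr] = (lo, hi)

    used = {table_of(a) for a in q.select} | {table_of(a) for a in kept}
    tables = set(q.tables)
    joins: List[Tuple[str, str]] = []
    for i, (fk, pk) in enumerate(q.joins):
        target = table_of(pk)
        needed_elsewhere = target in used or any(
            target in (table_of(f), table_of(p))
            for j, (f, p) in enumerate(q.joins) if j != i
        )
        if (fk, pk) in c.foreign_keys and not needed_elsewhere:
            tables.discard(target)        # join elimination
            continue
        joins.append((fk, pk))
    return Query(list(q.select), tables, kept, joins)


# Usage with made-up relations and constraints.
c = Constraints(
    ranges={"orders.amount": (1, 1000)},
    foreign_keys={("orders.cust_id", "customers.id")},
)
q = Query(
    select=["orders.id"],
    tables={"orders", "customers"},
    predicates={"orders.amount": (0, 10**9), "orders.year": (2000, 2005)},
    joins=[("orders.cust_id", "customers.id")],
)
print(optimize(q, c))
```

In this sketch predicate elimination runs before join elimination, so
that dropping a redundant predicate on a referenced relation can make
its join removable as well.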
|